63 research outputs found

A Fast Cache-Oblivious Mesh Layout with Theoretical Guarantees

Author: Danjean Vincent
Raffin Bruno
Tchiboukdjian Marc
Publication venue: HAL CCSD
Publication date: 07/06/2008

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take the memory hierarchy into consideration to design cache-efficient algorithms. CO approaches have the advantage of adapting to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve on the performance of previous approaches. However, these algorithms are based on heuristics. In this paper we propose a new CO algorithm for meshes that has both low theoretical complexity and proven quality. We guarantee that a coherent traversal of an N-size mesh in dimension d will induce fewer than N/B + O(N/M^{1/d}) cache misses, where B and M are the block size and the cache size. We compare our layout with previous ones on several 3D meshes.

INRIA a CCSD electronic archive server

Improving Reactivity to I/O Events in Multithreaded Environments Using a Uniform, Scheduler-Centric API

Author: Bougé Luc
Danjean Vincent
Namyst Raymond
Publication venue: HAL CCSD
Publication date: 01/01/2002

Reactivity to I/O events is a crucial factor for the performance of modern multithreaded distributed systems. In our scheduler-centric approach, an application detects I/O events by requesting a service from a detection server through a simple, uniform API. We show that a good choice for this detection server is the thread scheduler. This approach simplifies application programming, significantly improves performance, and provides much tighter control over reactivity.

INRIA a CCSD electronic archive server

Binary Mesh Partitioning for Cache-Efficient Processing

Author: Danjean Vincent
Raffin Bruno
Tchiboukdjian Marc
Publication venue: HAL CCSD
Publication date: 01/01/2009

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take the memory hierarchy into consideration to design cache-efficient algorithms. CO approaches have the advantage of adapting to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve on the performance of previous approaches, but lack theoretical performance guarantees. In this report we present an O(N log N) algorithm to compute a CO layout for unstructured meshes. We prove that a coherent traversal of an N-size mesh in dimension d will induce fewer than N/B + O(N/M^{1/d}) cache misses, where B and M are the block size and the cache size. Experiments show that our layout computation is faster and consumes significantly less memory than the best known CO algorithm. Performance is comparable to that algorithm for the access patterns of classical visualization algorithms, and better if the access pattern is adapted to the binary mesh partitioning produced by the algorithm. We also show that cache-oblivious approaches lead to significant performance increases on recent GPU architectures.
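
To get a feel for the bound quoted in the abstract above, the following minimal sketch evaluates N/B + c·N/M^{1/d} for sample values. The function name, the explicit `constant` parameter (standing in for the constant hidden by the O() notation), and the sample sizes are assumptions made for illustration; none of them come from the report.

```python
def cache_miss_bound(n, block_size, cache_size, dim=3, constant=1.0):
    """Evaluate N/B + c * N / M^(1/d).

    `constant` is a hypothetical stand-in for the factor hidden by O();
    the abstract does not state its value.
    """
    return n / block_size + constant * n / cache_size ** (1.0 / dim)


# Illustrative values only: a mesh of one million elements, blocks of 64
# elements, and a cache holding 32768 elements (32^3), in dimension 3.
bound = cache_miss_bound(1_000_000, 64, 32_768, dim=3)
print(round(bound))
```

With these sample values the second term dominates, which suggests why the cache size M matters more than the block size B once the mesh is much larger than the cache.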

INRIA a CCSD electronic archive server

X-Kaapi C programming interface

Author: Danjean Vincent
Gautier Thierry
Le Mentec Fabien
Publication venue: HAL CCSD
Publication date: 01/12/2011

This report defines the X-Kaapi C programming interface.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

The X-Kaapi's Application Programming Interface. Part I: Data Flow Programming

Author: Danjean Vincent
Gautier Thierry
Le Mentec Fabien
Publication venue: HAL CCSD
Publication date: 01/12/2011

In this report, we present X-Kaapi's programming model. An X-Kaapi parallel program is a sequential C or C++ program annotated with #pragma compiler directives that describe the program's tasks. A dedicated source-to-source compiler translates the X-Kaapi directives into runtime calls. The generated code relies on the X-Kaapi runtime to extract the data-flow graph at execution time, including for recursive programs whose tasks are generated recursively.

INRIA a CCSD electronic archive server

An Efficient Multi-level Trace Toolkit for Multi-threaded Applications

Author: Danjean Vincent
Namyst Raymond
Wacrenier Pierre-André
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2005

Nowadays, observing and understanding the behavior and performance of a multithreaded application is nontrivial, especially within a complex multithreaded environment such as a multilevel thread scheduler. In this report, we present a trace toolkit that allows a programmer to precisely analyze the behavior of a multithreaded application. An application's run generates several traces that are merged and analyzed offline. The resulting super-trace contains not only classical information, such as the number of elapsed CPU cycles per function, but also details about thread scheduling at multiple levels.
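
The offline merge mentioned in the abstract above can be pictured with a small sketch. It assumes that each trace is a list of events already sorted by timestamp and that merging means interleaving them by timestamp; the event layout, the level names, and the `merge_traces` helper are hypothetical and do not describe the toolkit's actual format or API.

```python
import heapq


def merge_traces(*traces):
    """Merge per-level event lists, each already sorted by timestamp."""
    # heapq.merge interleaves sorted inputs lazily without re-sorting them.
    return list(heapq.merge(*traces, key=lambda event: event[0]))


# Hypothetical events: (timestamp, level, description).
kernel_trace = [(10, "kernel", "switch to thread 1"), (30, "kernel", "switch to thread 2")]
user_trace = [(5, "user", "enter compute()"), (20, "user", "leave compute()")]

super_trace = merge_traces(kernel_trace, user_trace)
for timestamp, level, description in super_trace:
    print(timestamp, level, description)
```

Keeping the level in each event is one way to let the merged trace show what happened at every scheduling level at a given time.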

INRIA a CCSD electronic archive server

IV Grid Plugtests: composing dedicated tools to run an application efficiently on Grid'5000

Author: Besseron Xavier
Danjean Vincent
Gautier Thierry
Guelton Serge
Huard Guillaume
Wagner Frédéric
Publication venue
Publication date: 12/02/2008

Efficiently exploiting the resources of the whole of Grid'5000 with a single application requires solving several issues: 1) resource reservation; 2) deployment of the application's processes; 3) scheduling of the application's tasks. For the IV Grid Plugtests, we used a dedicated tool for each issue. The N-Queens contest rules imposed ProActive for resource reservation (issue 1). Issue 2 was solved using TakTuk, which can deploy to a large set of remote nodes. Deployed nodes take part in the deployment using an adaptive algorithm that makes it very efficient. For issue 3, we wrote our application with the Athapascan API, whose model is based on the concepts of tasks and shared data. The application is described as a data-flow graph using the Shared and Fork keywords. This high-level abstraction of the hardware gives us an efficient execution with the Kaapi runtime engine, which uses a work-stealing scheduling algorithm to balance the workload between all the distributed processes.
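
The work-stealing idea named in the abstract above can be illustrated with a toy, round-based simulation. This is a generic sketch, not Kaapi's scheduler: the `run_work_stealing` function, the single-process setting, and the deterministic choice of the most loaded victim (real schedulers often pick victims at random) are all assumptions made for illustration.

```python
from collections import deque


def run_work_stealing(initial_tasks, num_workers):
    """Simulate round-based work stealing; return the tasks run by each worker."""
    # All tasks start on worker 0, as if a single process had spawned them.
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(initial_tasks)
    executed = [[] for _ in range(num_workers)]

    while any(queues):
        for worker in range(num_workers):
            if queues[worker]:
                # The owner takes its most recently pushed task (LIFO end).
                executed[worker].append(queues[worker].pop())
            else:
                # An idle worker steals the oldest task (FIFO end) from the
                # most loaded victim; a deterministic choice for this sketch.
                victim = max(range(num_workers), key=lambda w: len(queues[w]))
                if queues[victim]:
                    executed[worker].append(queues[victim].popleft())
    return executed


result = run_work_stealing(list(range(8)), num_workers=2)
print(result)
```

Even though every task starts on one worker, stealing spreads the load so that both workers end up running the same number of tasks in this example.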

Open Repository and Bibliography - Luxembourg

Detecção de Anomalias de Desempenho em Aplicações de Alto Desempenho baseadas em Tarefas em Clusters Híbridos

Author: Danjean Vincent
Legrand Arnaud
Mello Schnorr Lucas
Pinto Vinicius
Stanisic Luka
Thibault Samuel
Publication venue: HAL CCSD
Publication date: 23/07/2018

Programming paradigms in High-Performance Computing have been shifting towards task-based models, which can adapt more readily to heterogeneous and scalable supercomputers. Detecting performance anomalies in such environments is particularly difficult since it must consider architecture heterogeneity, variability, and the capability to obtain trusted measurements. This work presents a case study of anomaly detection in the execution of the well-known tiled dense Cholesky factorization developed with StarPU. Our experiments were conducted on a variety of hybrid multi-node platforms to demonstrate how we are able to detect and highlight performance anomalies.

INRIA a CCSD electronic archive server

An Effective Git And Org-Mode Based Workflow For Reproducible Research

Author: Arnaud Legrand
Drummond C.
Luka Stanisic
Ruiz Sanabria C. C.
Vincent Danjean
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date

Crossref

Adaptive and Hybrid Algorithms: classification and illustration on triangular system solving

Author: Cung Van-Dat
Danjean Vincent
Dumas Jean-Guillaume
Gautier Thierry
Huard Guillaume
Raffin Bruno
Rapine Christophe
Roch Jean-Louis
Trystram Denis
Publication venue: Copias Coca, Madrid
Publication date: 01/04/2006

In this article we propose a classification of the different notions of hybridization and a generic framework for the automatic hybridization of algorithms. We then detail the results of this generic framework on the example of the parallel solution of multiple linear systems.

INRIA a CCSD electronic archive server